Abstract

In this paper we consider a general problem set-up for a wide class of convex and robust distributed optimization 
problems in peer-to-peer networks. In this set-up convex constraint sets are distributed to the network processors who 
have to compute the optimizer of a linear cost function subject to the constraints. We propose a novel fully distributed 
algorithm, named cutting-plane consensus, to solve the problem, based on an outer polyhedral approximation of 
the constraint sets. Processors running the algorithm compute and exchange linear approximations of their locally 
feasible sets. Independently of the number of processors in the network, each processor stores only a small number 
of linear constraints, making the algorithm scalable to large networks. The cutting-plane consensus algorithm is 
presented and analyzed for the general framework. Specifically, we prove that all processors running the algorithm 
agree on an optimizer of the global problem, and that the algorithm is tolerant to node and link failures as long as 
network connectivity is preserved. Then, the cutting-plane consensus algorithm is specialized to three different classes
of distributed optimization problems, namely (i) inequality constrained problems, (ii) robust optimization problems,
and (iii) almost separable optimization problems with separable objective functions and coupling constraints. For
each one of these problem classes we solve a concrete problem that can be expressed in that framework and present
computational results. That is, we show how to solve: position estimation in wireless sensor networks, a distributed
robust linear program, and a distributed microgrid control problem.

I. Introduction 

The ability to solve optimization problems by local data exchange between identical processors with 
small computation and communication capabilities is a fundamental prerequisite for numerous decision and 
control systems. Algorithms for such distributed systems have to work within the following specifications 
JH. All processors running the algorithm are exactly identical and each processor has only a small memory 
available. The data assigned by the algorithm to a processor should be independent of the overall network 
size or only slowly growing with the degree of the processor node in the network. None of the processors 
has global information or can solve the problem independently. 

This paper addresses a class of optimization problems in distributed processor networks with asyn- 
chronous communication. Distributed, or peer-to-peer, optimization is related to parallel [2] or large-scale 
optimization [3], but has to meet further requirements, such as asynchronous communication and lack 
of shared memory or coordination units. Distributed optimization has gained significant attention in the 
last years. Initially major attention was given to asynchronous distributed subgradient methods flU, 0. 
Asynchronous distributed primal and dual subgradient algorithms are important tools in network utility 
maximization and have been intensively studied from a communication networks perspective, see O, 0. 

Combined with projection operations, subgradient methods can also solve constrained optimization 
problems jH, fl6[. In the last years, the research scope has been widened and now several different 
distributed algorithms are explored, each suited for particular optimization problems. Distributed Newton 
methods are proposed for Network Utility Maximization [0, ifTOll . or unconstrained strongly convex 
problems [11]. Distributed variants of the Alternating Direction Method of Multipliers (ADMM) have been
proposed for distributed estimation [12] and in the wider context of machine learning [13]. ADMM
often shows a good convergence rate. However, one structural difference between ADMM and distributed
algorithms such as subgradient methods or the novel method proposed in this paper has to be emphasized. 
In a centralized implementation ADMM requires a coordination step. Distributed ADMM replaces this 
central coordination with a consensus algorithm. However, this requires a synchronization between all 
processors in the network, i.e., all processors have to switch synchronously between local computations 
and the consensus algorithm. Fully distributed algorithms, as the one proposed here, work asynchronously 
and every processor can switch between local computations and communications at its own pace. 

An alternative research direction was established in [14] and [15], where distributed abstract optimization
problems were considered. A similar approach was explored in [16], [17] for a distributed simplex
algorithm that solves degenerate linear programs and multi-agent assignment problems. Some results
on the use of distributed cutting-plane methods for robust optimization have been presented in [18]. The
results of [18] are presented in the present paper in the wider context of general distributed optimization
using cutting-plane methods.

The contributions of this paper are as follows. Motivated by several important applications, we 
consider a general distributed optimization framework in which each processor has knowledge of a convex 
constraint set and a linear cost function has to be optimized over the intersection of these constraint sets. 
It is worth noting that linearity of the cost function is not a limitation and that strict convexity of the 
optimization problem is not required. A novel fully distributed algorithm named cutting plane consensus is 
proposed to solve this class of distributed problems. The algorithm uses a polyhedral outer-approximation 
of the constraint set. Processors performing the algorithm generate and exchange a small and fixed number 
of linear constraints, which provide a polyhedral approximation of the original optimization problem. 
Then, each processor updates its local estimate of the globally optimal solution as the minimal 2-norm 
solution of the approximate optimization problem. We prove the correctness of the algorithm in the 
sense that all processors asymptotically agree on a globally optimal solution. We show that the proposed
algorithm satisfies all requirements of peer-to-peer processor networks. In particular, it requires only a 
strictly bounded local memory and the communication is allowed to be asynchronous. We also prove that 
the algorithm has an inherent tolerance against the failure of single processors. 

To highlight the generality of the proposed polyhedral approximation method, we show how it can solve 
three different representations of the general distributed convex program. First, we consider constraint 
sets defined by nominal convex inequality constraints. Second, we discuss the method for a class of 
uncertain or semi-infinite constraints. We show that the novel algorithm is capable of computing robust 
solutions to uncertain problems in peer-to-peer networks. Finally, we show that almost separable convex 
programs, i.e., convex optimization problems with separable objective functions and coupling constraints, 
can be formulated in the general framework when their dual representation is considered. Applied to this 
problem class the Cutting-Plane Consensus algorithm can be seen as a fully distributed version of the 
classical Dantzig-Wolfe decomposition, or column generation method, with no central coordinating master 
program. 

The general algorithm derived in the paper applies directly to each of the three problem classes, 
and all convergence guarantees remain valid. We present for each problem class a relevant decision 
problem, which can be solved by the novel algorithm. In particular, it is shown that localization in sensor 
networks, robust linear programming and distributed control of microgrids can be solved by the algorithm. 
Additionally, computational studies are presented which show that the novel algorithm has an advantageous 
time complexity. 

Relation to other optimization methods: The general problem formulation of this paper is similar to
the formulation considered in [8]. However, while the approach in [8] requires a projection operation, which
might be computationally expensive for some constraint classes, our approach requires only the knowledge 
of a polyhedral approximation. Additionally, our method works on general time-varying directed graphs, 
and does not require a balanced communication. 
The almost separable optimization problem setup studied in Section VII is the classical setup for large
scale optimization. Dual decomposition methods decompose these problems into a master program and 
several subproblems. Cutting-plane methods can be used to solve the master program, leading to algorithms 
that originate in the classical Dantzig-Wolfe decomposition |fl9ll , 0). The algorithm we propose differs 
significantly from classical decomposition methods. Indeed, our algorithm performs in asynchronous peer- 
to-peer networks with identical processors, without any central or coordinating master program. 

A distributed ADMM implementation [13] uses an average consensus algorithm to replace the update
of the central master program. While this allows all computations to be performed in a decentralized way, it requires
a synchronization between the processors. Additionally, performing a consensus algorithm repeatedly
might require many communication steps between the processors. In contrast, our method requires neither 
synchronized communication nor a repeated averaging. 

The remainder of the paper is organized as follows. The optimization problem and the processor
network model are introduced in Section II. In Section III the ideas of polyhedral outer-approximation
and minimal norm linear programming are reviewed. The main contribution of this paper, the Cutting-Plane
Consensus algorithm, is presented in a general form in Section IV, where the correctness of
the algorithm and its fault-tolerance are also proven. The application of the algorithm to inequality constrained
problems and to a localization problem in sensor networks is presented in Section V. In Section VI
it is shown how the algorithm can be used to solve distributed robust optimization problems, and a
computational study is presented, which compares the completion time of the novel algorithm to an
ADMM algorithm. The application of the Cutting-Plane Consensus algorithm to almost separable convex
optimization problems and to distributed microgrid control is discussed in Section VII. Finally, a
concluding discussion is given in Section VIII.



II. Problem Formulation and Network Model 

We consider a set of processors $V = \{1, \ldots, n\}$, each equipped with communication and computation
capabilities. Each processor $i$ has knowledge of a convex and closed constraint set $Z_i \subset \mathbb{R}^d$. The processors
have to agree on a decision vector $z \in \mathbb{R}^d$ maximizing a linear objective over the intersection of all sets
$Z_i$. That is, the processors have to solve the distributed convex optimization problem

$$\text{maximize } c^T z \quad \text{subject to } z \in \bigcap_{i=1}^{n} Z_i. \tag{1}$$

We denote the feasible set in the following as $Z := \bigcap_{i=1}^{n} Z_i$. We assume that $Z$ is non-empty and that
(1) has a finite optimal solution.

The communication between the processors is modeled by a directed graph (digraph) $\mathcal{G}_c = (V, E)$,
named communication graph. The node set $V = \{1, \ldots, n\}$ is the set of processor identifiers, and
the edge set $E \subset \{1, \ldots, n\}^2$ characterizes the communication among the processors. If the edge set
does not change over time, the graph is called static, otherwise it is called time-varying. We model the
communication with time-varying digraphs of the form $\mathcal{G}_c(t) = (V, E(t))$, where $t \in \mathbb{N}$ represents a
slotted universal time. A graph $\mathcal{G}_c(t)$ models the communication in the sense that at time $t$ there is an
edge from node $i$ to node $j$ if and only if processor $i$ transmits information to processor $j$ at time $t$. The
time-varying set of outgoing (incoming) neighbors of node $i$ at time $t$, i.e., the set of nodes to (from)
which there are edges from (to) $i$ at time $t$, is denoted by $\mathcal{N}_O(i,t)$ ($\mathcal{N}_I(i,t)$). In a static directed graph,
the minimum number of edges between node $i$ and $j$ is called the distance from $i$ to $j$ and is denoted
by $\mathrm{dist}(i,j)$. The maximum $\mathrm{dist}(i,j)$ taken over all pairs $(i,j)$ is the diameter of the graph $\mathcal{G}_c$ and is
denoted by $\mathrm{diam}(\mathcal{G}_c)$. A static digraph is said to be strongly connected if for every pair of nodes $(i,j)$
there exists a path of directed edges that goes from $i$ to $j$. For the time-varying communication graph we
rely on the concept of a jointly strongly connected graph.

Assumption 2.1 (Joint Strong Connectivity): For every time instant $t \in \mathbb{N}$, the union digraph
$\mathcal{G}_c^{\infty}(t) := \bigcup_{\tau = t}^{\infty} \mathcal{G}_c(\tau)$ is strongly connected. □
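To make Assumption 2.1 concrete, the following minimal sketch checks strong connectivity of a union digraph built from a finite list of edge sets. It is only an illustration: the function names are hypothetical, and checking a finite window is sufficient only under the additional assumption that the communication schedule repeats periodically, which is not required by the assumption itself.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (i, j) means processor i transmits to processor j


def reachable(nodes: Iterable[int], edges: Set[Edge], start: int) -> Set[int]:
    """Return all nodes reachable from `start` along directed edges."""
    succ: Dict[int, List[int]] = {v: [] for v in nodes}
    for i, j in edges:
        succ[i].append(j)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in succ[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def is_strongly_connected(nodes: List[int], edges: Set[Edge]) -> bool:
    """Strongly connected iff a fixed root reaches every node in the graph
    and in the reversed graph."""
    if not nodes:
        return True
    root = nodes[0]
    reverse = {(j, i) for i, j in edges}
    everything = set(nodes)
    return (reachable(nodes, edges, root) == everything
            and reachable(nodes, reverse, root) == everything)


def union_is_strongly_connected(nodes: List[int],
                                edge_sets: List[Set[Edge]]) -> bool:
    """Check strong connectivity of the union of several snapshots E(t)."""
    union: Set[Edge] = set().union(*edge_sets)
    return is_strongly_connected(nodes, union)


# Hypothetical periodic schedule on three processors: neither snapshot is
# strongly connected on its own, but their union is a directed ring.
nodes = [0, 1, 2]
even_slots = {(0, 1)}
odd_slots = {(1, 2), (2, 0)}
print(union_is_strongly_connected(nodes, [even_slots, odd_slots]))
```

The example illustrates why joint connectivity is a weak requirement: no single time slot needs to provide a connected network.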

In this paper we develop a distributed, asynchronous algorithm solving problem (1) according to the
network model described above. Each processor stores a small set of data and transmits at each time
instant these data to its out-neighbors $\mathcal{N}_O(i,t)$. It is worth noting that in general it is impossible to encode
the convex set $Z_i$ with finite data. Thus, the information about the sets $Z_i$ cannot be explicitly exchanged
among the processors.



III. Polyhedral Approximation and Minimal Norm Linear Programming 

We start by recalling some important concepts from convex and linear optimization. We will work in the
following with half-spaces of the form $h := \{z : a^T z - b \leq 0\}$, where $a \in \mathbb{R}^d$ and $b \in \mathbb{R}$. A half-space
is called a cutting-plane if it satisfies the following properties. Given a closed convex set $S \subset \mathbb{R}^d$ and a
query point $z_q \notin S$, a cutting-plane $h(z_q)$ separates $z_q$ from $S$, i.e., $a(z_q) \neq 0$ and

$$a^T(z_q)\, z \leq b(z_q) \ \text{ for all } z \in S, \quad \text{and} \quad a^T(z_q)\, z_q - b(z_q) = s(z_q) > 0. \tag{2}$$

The concept of cutting-planes leads to the first algorithmic primitive, the cutting-plane oracle.

Cutting-Plane Oracle $\mathrm{ORC}(z_q, S)$: queried at $z_q \in \mathbb{R}^d$ for the set $S$. If (i) $z_q \notin S$ then it returns
a cutting-plane $h(z_q)$, separating $z_q$ and $S$, otherwise (ii) it asserts that $z_q \in S$ and returns an
empty $h$.

We make the following assumption on the cutting-plane oracle, following the general cutting-plane
framework of [20].

Assumption 3.1: The cutting-plane oracle $\mathrm{ORC}(z_q, S)$ is such that (i) $\|a(z_q)\|_2 < \infty$ and (ii) $z_q(t) \to \bar{z}$
and $s(z_q(t)) \to 0$ implies that $\bar{z} \in S$.

Note that this assumption is not very restrictive and holds for many important problem formulations. In 
fact, we discuss three important problem classes for which the assumption holds. 
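As an illustration of the oracle primitive, the following sketch implements a cutting-plane oracle for a Euclidean ball, a set chosen here only as an example. The function name `ball_oracle` and the encoding of a half-space as a pair `(a, b)` are assumptions made for this sketch. The returned normal vector has unit norm, and the violation $s(z_q)$ equals the distance of the query point to the ball, so both parts of Assumption 3.1 hold for this particular oracle.

```python
import math
from typing import Optional, Sequence, Tuple

Cut = Tuple[Tuple[float, ...], float]  # (a, b) encodes the half-space a^T z <= b


def ball_oracle(z_q: Sequence[float],
                center: Sequence[float],
                radius: float) -> Optional[Cut]:
    """Cutting-plane oracle for S = {z : ||z - center||_2 <= radius}.

    Returns None if z_q is in S, otherwise a half-space that contains S
    and excludes z_q."""
    diff = [zq - v for zq, v in zip(z_q, center)]
    dist = math.sqrt(sum(x * x for x in diff))
    if dist <= radius:
        return None  # z_q is feasible, the oracle returns an empty h
    # Unit normal pointing from the center towards the query point.
    a = tuple(x / dist for x in diff)
    # The supporting hyperplane touches the ball at center + radius * a.
    b = sum(ai * vi for ai, vi in zip(a, center)) + radius
    return a, b


cut = ball_oracle((3.0, 0.0), (0.0, 0.0), 1.0)
a, b = cut
violation = sum(ai * zi for ai, zi in zip(a, (3.0, 0.0))) - b
print(cut, violation)
```

For the query point $(3, 0)$ and the unit ball, the violation is the distance $3 - 1 = 2$ between the query point and the set.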

Given a collection of cutting-planes $H = \bigcup_{k=1}^{m} h_k$, the polyhedron induced by these cutting-planes
is $\mathcal{H} = \{z : A_H^T z \leq b_H\}$, with the matrix $A_H \in \mathbb{R}^{d \times m}$ as $A_H = [a_1, \ldots, a_m]$, and the vector
$b_H = [b_1, \ldots, b_m]^T$.

Remark 3.2 (Cutting-plane notation): We refer to both a half-space $h$ and the data inducing the half-space
with a small italic letter. A collection of cutting-planes is denoted with italic capital letters, e.g.,
$H = \bigcup_{k=1}^{m} h_k$. For a collection of cutting-planes, we denote the induced polyhedron with capital calligraphic
letters, e.g., $\mathcal{H}$. Please note the following notational aspect. A collection of cutting-planes $B$ that is a
subset of the cutting-planes contained in $H$ is denoted as $B \subset H$, while the induced polyhedra satisfy
$\mathcal{B} \supseteq \mathcal{H}$. □

Assume that each cutting-plane $h_i$ is generated as a separating hyperplane for some set $Z_j$, and let $H$
be a collection of cutting-planes. The linear approximate program

$$\max_{z} \ c^T z \quad \text{s.t.} \quad A_H^T z \leq b_H \tag{3}$$

is then a relaxation of the original optimization problem (1), since the polyhedron $\mathcal{H} = \{z : A_H^T z \leq b_H\}$ is
an outer approximation of the original constraint set $Z = \bigcap_{i=1}^{n} Z_i$. We denote in the following the optimal
value of (3) as $\gamma_H$, i.e., $\gamma_H := \max_{z \in \mathcal{H}} c^T z$. The linear program (3) has in general several optimizers, and
we denote the set of all optimizers of (3) with

$$\Gamma_H := \{z \in \mathcal{H} : c^T z \geq c^T v, \ \forall v \in \mathcal{H}\}. \tag{4}$$

It is a standard result in linear programming that $\Gamma_H$ is always a polyhedral set. We consider throughout
the paper the unique optimal solution to (3) which has the minimal 2-norm, i.e., we aim to compute

$$z_H^* = \arg\min_{z \in \Gamma_H} \|z\|_2. \tag{5}$$
Finding a minimal norm solution to a linear program is a classical problem and various solution methods
have been proposed in the literature. Starting from the early reference [21], research on this topic is still actively
pursued [22]. The minimal 2-norm solution can be efficiently computed as the solution of a quadratic
program.

Proposition 3.3: Let $u^* \in \mathbb{R}^{|H|}$, $\alpha^* \in \mathbb{R}$, $l^* \in \mathbb{R}^d$ be the optimal solution to

$$\min_{u,\alpha,l} \ \tfrac{1}{2}(A_H u + \alpha c)^T (A_H u + \alpha c) + b_H^T u + c^T l \quad \text{s.t.} \quad A_H^T l - \alpha b_H \geq 0, \ u \geq 0, \tag{6}$$

then $z_H^* = -A_H u^* - \alpha^* c$ solves (5). □
The proof of this result is presented in Appendix A. The minimal 2-norm solution has the important 
property that it always maximizes a strongly concave cost function. 

Lemma 3.4: Let a set of cutting-planes define the polyhedron $\mathcal{H}$ and let $z_H^*$ be the minimal 2-norm
solution to (3). Consider the quadratically perturbed linear objective

$$J_\epsilon(z) = c^T z - \frac{\epsilon}{2}\|z\|_2^2$$

parametrized with a constant $\epsilon > 0$. Then there exists an $\bar{\epsilon} > 0$ such that for any $\epsilon \in [0, \bar{\epsilon}]$

$$z_H^* = \arg\max_{z \in \mathcal{H}} J_\epsilon(z). \tag{7}$$

□

The proof of this result is very similar to the classical proof presented in [23]. However, since the
considered set-up is slightly different and the result is fundamental for the methodologies developed in
the paper, we present the proof in Appendix B.
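A small worked example may help to fix ideas; the specific polyhedron below is chosen only for illustration and is not taken from the applications of the paper. Let $d = 2$, $c = [1, 0]^T$, and let the cutting-planes induce the polyhedron $\{z : z_1 \leq 1, \ -1 \leq z_2 \leq 1\}$. The linear program maximizes $z_1$, so every point on the segment $\{(1, z_2) : -1 \leq z_2 \leq 1\}$ is an optimizer, and the optimizer with minimal 2-norm is

$$z^* = [1, 0]^T.$$

For the perturbed objective one obtains

$$J_\epsilon(z) = z_1 - \frac{\epsilon}{2}\left(z_1^2 + z_2^2\right),$$

which is maximized over the polyhedron by $z_2 = 0$ and $z_1 = \min\{1, 1/\epsilon\}$ for any $\epsilon > 0$. Hence the maximizer of $J_\epsilon$ coincides with $z^*$ for all $0 < \epsilon \leq 1$, while for $\epsilon > 1$ the quadratic term dominates and the maximizer $[1/\epsilon, 0]^T$ is no longer an optimizer of the linear program. In this example the threshold of the lemma can therefore be taken as $\bar{\epsilon} = 1$.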

Any solution to a (feasible) linear program of the form (3) is fully determined by at most $d$ constraints.
This is naturally also true for the minimal 2-norm solution of a linear program. We formalize this property
with the notion of basis. Given a collection of cutting-planes $H$, we say that a subset $B \subseteq H$ is a basis
of $H$ if the minimal 2-norm solution to the linear program defined with the constraint set $B$, say $z_B^*$, is
identical to the minimal 2-norm solution of the linear program defined with the constraint set $H$, say $z_H^*$,
i.e., $z_B^* = z_H^*$, while for any strict subset of cutting-planes $B' \subset B$, it holds that $z_{B'}^* \neq z_B^*$. For a feasible
problem, the cardinality of a basis is bounded by the dimension of the problem, i.e., $|B| \leq d$. Throughout
this paper, a basis is always considered to be a basis with respect to the 2-norm solution of the linear
program, and a basis computation requires computing the solution to problem (6). Note, however, that
the active constraints at an optimal point $z_H^*$ are always a superset of a basis at this point and are exactly
a basis if the problem is not degenerate. Therefore, in most cases it will be sufficient to find the active
constraints, which are easy to detect.
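The relation between a basis and the active constraints can be seen on a small example, constructed here only for illustration. Let $d = 2$, $c = [1, 0]^T$, and consider the collection of cutting-planes

$$H = \{\, z_1 \leq 1, \quad z_1 \leq 2, \quad z_2 \leq 1, \quad -z_2 \leq 1, \quad z_1 + z_2 \leq 1 \,\}.$$

The optimizers of the linear program are the points $(1, z_2)$ with $-1 \leq z_2 \leq 0$, so the minimal 2-norm solution is $z_H^* = [1, 0]^T$. The single cutting-plane $B = \{z_1 \leq 1\}$ is a basis: maximizing $z_1$ subject to $z_1 \leq 1$ alone leaves $z_2$ free, and the minimal 2-norm optimizer is again $[1, 0]^T$, while the empty collection yields an unbounded program. The basis has cardinality $|B| = 1 \leq d$. The active constraints at $z_H^*$ are $\{z_1 \leq 1, \ z_1 + z_2 \leq 1\}$, which is a strict superset of $B$ because the example is degenerate at $z_H^*$.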

IV. The Cutting-Plane Consensus Algorithm 

For a network of processors, we propose and analyze the Cutting-Plane Consensus algorithm to solve
distributed convex optimization problems of the form (1).

The Distributed Cutting-Plane Consensus Algorithm

The algorithm to solve general distributed optimization problems (1) is as follows.

Cutting-Plane Consensus: Processors store and update collections of cutting-planes. The cutting-planes
stored by agent $i$ at iteration $t$ are always a basis of a corresponding linear approximate
program (3), and are denoted by $B^{[i]}(t)$. A processor initializes its local collection of cutting-planes
$B_0^{[i]}$ with a set of cutting-planes chosen such that $\mathcal{B}_0^{[i]} \supseteq Z_i$ and $\max_{z \in \mathcal{B}_0^{[i]}} c^T z < \infty$.
Each processor then repeats the following steps:

(S1) it transmits its current basis $B^{[i]}(t)$ to all its out-neighbors $\mathcal{N}_O(i,t)$ and receives the bases
of its in-neighbors $Y^{[i]}(t) = \bigcup_{j \in \mathcal{N}_I(i,t)} B^{[j]}(t)$;

(S2) it defines $H_{tmp}^{[i]}(t) = B^{[i]}(t) \cup Y^{[i]}(t)$, and computes (i) a query point $z^{[i]}(t)$ as the minimal
2-norm solution to the approximate program (3), i.e.,

$$z^{[i]}(t) = \arg\min_{z \in \Gamma_{H_{tmp}^{[i]}(t)}} \|z\|_2$$

and (ii) a minimal set of active constraints $B_{tmp}^{[i]}(t)$;

(S3) it calls the cutting-plane oracle for the constraint set $Z_i$ at the query point $z^{[i]}(t)$,

$$h(z^{[i]}(t)) = \mathrm{ORC}(z^{[i]}(t), Z_i);$$

(S4) it updates its collection of cutting-planes as follows: if $z^{[i]}(t) \in Z_i$ then $B^{[i]}(t+1) = B_{tmp}^{[i]}(t)$,
otherwise $B^{[i]}(t+1)$ is set to the minimal basis of $B_{tmp}^{[i]}(t) \cup h(z^{[i]}(t))$.
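The four steps can be traced on a deliberately simple toy instance. The sketch below is not the general algorithm: it assumes $d = 1$, $c = 1$, interval constraint sets $Z_i = [lo_i, hi_i]$ with non-empty intersection, a static directed ring, and synchronous rounds, all chosen only to keep the code free of an LP solver. In one dimension the optimizer of the approximate program is unique, so it is trivially the minimal 2-norm solution, and the scheme reduces to a min-consensus on the upper bounds. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cut = Tuple[float, float]  # (a, b) encodes a*z <= b with a = +1.0 or -1.0


def solve(cuts: List[Cut]) -> Tuple[float, List[Cut]]:
    """Maximize z subject to all cuts; return the optimizer and a basis.

    The basis is the single tightest upper bound z <= b."""
    upper = [cut for cut in cuts if cut[0] > 0]
    lower = [cut for cut in cuts if cut[0] < 0]
    tight = min(upper, key=lambda cut: cut[1])
    z = tight[1]
    assert all(-b <= z for _, b in lower), "approximate program is infeasible"
    return z, [tight]


def oracle(z_q: float, lo: float, hi: float) -> Optional[Cut]:
    """Cutting-plane oracle for the interval [lo, hi]."""
    if z_q > hi:
        return (1.0, hi)     # z <= hi separates z_q from the interval
    if z_q < lo:
        return (-1.0, -lo)   # -z <= -lo, i.e. z >= lo
    return None


def cutting_plane_consensus(intervals: List[Tuple[float, float]],
                            in_neighbors: Dict[int, List[int]],
                            big_m: float,
                            rounds: int) -> List[float]:
    """Run synchronous rounds of steps (S1)-(S4); return the last query points."""
    n = len(intervals)
    bases: List[List[Cut]] = [[(1.0, big_m)] for _ in range(n)]  # artificial bound z <= M
    queries = [big_m] * n
    for _ in range(rounds):
        sent = [list(basis) for basis in bases]            # (S1) transmitted bases
        for i, (lo, hi) in enumerate(intervals):
            received = [cut for j in in_neighbors[i] for cut in sent[j]]
            z, b_tmp = solve(sent[i] + received)           # (S2) query point and active set
            queries[i] = z
            cut = oracle(z, lo, hi)                        # (S3) oracle call
            if cut is None:                                # (S4) basis update
                bases[i] = b_tmp
            else:
                bases[i] = solve(b_tmp + [cut])[1]
    return queries


intervals = [(0.0, 5.0), (1.0, 3.0), (-2.0, 4.0)]
ring = {0: [2], 1: [0], 2: [1]}  # processor j in ring[i] transmits to i
print(cutting_plane_consensus(intervals, ring, 100.0, 5))
```

In this instance the feasible set is $[1, 3]$, so all processors should agree on the maximizer $z = 3$ once the tightest upper bound has travelled around the ring.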

The four steps of the algorithm can be summarized as communication (S1), computation of the query
point (S2), generation of a cutting-plane (S3) and dropping of all inactive constraints (S4). The Cutting-Plane
Consensus algorithm is explicitly designed for use in processor networks. We want to emphasize
here the following four aspects of the algorithm.

Distributed Initialization: Each processor can initialize the local constraint sets as a basis of the artificial
constraint set $\{z \in \mathbb{R}^d : -M\mathbf{1} \leq z \leq M\mathbf{1}\}$ for some $M > 1$. If $M \in \mathbb{R}_{>0}$ is chosen sufficiently
large, the artificial constraints will be dropped during the evolution of the algorithm.

Bounded Communication: Each processor stores and transmits at most $(d+1)d$ numbers at a time. In
particular, processors exchange bases of (3), which are defined by not more than $d$ cutting-planes.
Each cutting-plane is fully defined by $d + 1$ numbers.

Bounded Local Computations: Each processor has to compute locally the 2-norm solution to a linear
program with $d(|\mathcal{N}_I(i,t)| + 1)$ constraints.

Asynchronous Communication: The Cutting-Plane Consensus algorithm does not require a time-synchronization.
Each processor can perform its local computations at any speed and update its local state whenever
it receives data from some of its in-neighbors.
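The two bounds stated above are simple counts and can be evaluated directly. The helper names in the following sketch are hypothetical; the formulas are the ones given in the text.

```python
def message_size(d: int) -> int:
    """Upper bound on the numbers in one transmitted basis:
    at most d cutting-planes, each defined by d + 1 numbers (a and b)."""
    return (d + 1) * d


def local_lp_constraints(d: int, in_degree: int) -> int:
    """Upper bound on the constraints of the local linear program:
    the own basis plus one basis per in-neighbor, each with at most d cuts."""
    return d * (in_degree + 1)


# A processor in the plane (d = 2) with three in-neighbors.
print(message_size(2), local_lp_constraints(2, 3))
```

Both quantities depend only on the dimension $d$ and the local in-degree, not on the number of processors $n$.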

Due to these properties, the Cutting-Plane Consensus algorithm is particularly well suited for optimiza- 
tion in large networks of identical processors. 

Technical Analysis of the Cutting-Plane Consensus Algorithm 

Before starting the proof of the algorithm correctness, we point out three important technical properties 
related to its evolution: 

• The linear constraints stored by a processor always form a polyhedral outer-approximation of the
globally feasible set $Z$.

• The value of the cost function at the query point of each processor is monotonically non-increasing
over the evolution of the algorithm.

• If the communication graph $\mathcal{G}_c$ is a strongly connected static graph, then after $\mathrm{diam}(\mathcal{G}_c)$
communication rounds, all processors in the network compute a query-point with a cost not worse than the
best processor at the initial iteration.

These properties provide an intuition about the functionality of the algorithm and the line we will follow
to prove its correctness. They are formalized and proven rigorously in Lemma A.1 in Appendix B. We
are ready to establish the correctness of the Cutting-Plane Consensus algorithm. We start by formalizing
two auxiliary results which are also interesting on their own. The first result states the convergence of the
query points to the locally feasible sets.



Lemma 4.1 (Convergence): Let Assumption 3.1 hold. Let $z^{[i]}(t)$ be the query point generated by
processor $i$ performing the Cutting-Plane Consensus algorithm. Then, the sequence $\{z^{[i]}(t)\}_{t \geq 0}$ has a limit
point in the set $Z_i$, i.e., there exists $\bar{z}^{[i]} \in Z_i$ such that

$$\lim_{t \to \infty} \|z^{[i]}(t) - \bar{z}^{[i]}\|_2 = 0. \qquad □$$

The second result shows that all processors in the network will reach an agreement.

Lemma 4.2 (Agreement): Assume the communication network $\mathcal{G}_c(t)$ is jointly strongly connected. Let
$z^{[i]}(t)$ be query points generated by the Cutting-Plane Consensus algorithm, then

$$\lim_{t \to \infty} \|z^{[i]}(t) - z^{[j]}(t)\|_2 = 0, \quad \text{for all } i, j \in \{1, \ldots, n\}. \qquad □$$

The correctness of the algorithm is summarized in the following theorem.

Theorem 4.3 (Correctness): Let $\mathcal{G}_c(t)$ be a jointly strongly connected communication network with
processors performing the Cutting-Plane Consensus algorithm, and let Assumption 3.1 hold. Let $z^*$ be
the unique optimizer to (1) with minimal 2-norm, then

$$\lim_{t \to \infty} \|z^{[i]}(t) - z^*\|_2 = 0, \quad \text{for all } i \in \{1, \ldots, n\}. \qquad □$$

For the clarity of presentation, the technical proofs of Lemma 4.1, Lemma 4.2 and Theorem 4.3 are
presented in Appendix B.

A major advantage for using the Cutting-Plane Consensus algorithm in distributed systems is its inherent 
fault-tolerance. The requirements on the communication network are very weak and the algorithm can well 
handle disturbances in the communication like, e.g., packet-losses or delays. Additionally, the algorithm 
has an inherent tolerance against processor failures. We say that a processor fails if it stops communicating
with other processors at some time $t_f$.

Theorem 4.4 (Fault-Tolerance): Suppose that processor $l$ fails at time $t_f$, and that the communication
network remains jointly strongly connected after the failure of processor $l$. Let $z^{[l]}(t_f)$ be the last query
point computed by processor $l$ and define $\gamma^{[l]}(t_f) = c^T z^{[l]}(t_f)$. Then the query-points computed by all
remaining processors converge, i.e., $\lim_{t \to \infty} \|z^{[i]}(t) - \bar{z}_{-l}\|_2 = 0$, with $\bar{z}_{-l}$ satisfying

$$\bar{z}_{-l} \in \bigcap_{i \neq l} Z_i \quad \text{and} \quad c^T \bar{z}_{-l} \leq \gamma^{[l]}(t_f).$$

Proof: Consider the evolution of the algorithm starting at time $t_f$. With Lemma 4.1 and Lemma 4.2
one can conclude that for all processors $i \neq l$, the query points will converge to the set $\bigcap_{i \neq l} Z_i$.
Additionally, the out-neighbors of the failing processor $l$ have received a basis $B^{[l]}(t_f)$ such that the
optimal value of the linear approximate program (3) is $\gamma^{[l]}(t_f)$. Any query point $z^{[i]}(t)$, $t \geq t_f$, subsequently
computed by the out-neighbors of processor $l$ as the solution to (3) must therefore be such that
$c^T z^{[i]}(t) \leq \gamma^{[l]}(t_f)$ for all $t \geq t_f$. ∎

This last result provides directly a paradigm for the design of fault-tolerant systems.

Corollary 4.5: Suppose that for all $l \in V$, $\bigcap_{i=1, i \neq l}^{n} Z_i = Z$. Then for all $l \in V$, $\bar{z}_{-l} = z^*$ with $z^*$ the
optimal solution to (1). □
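The redundancy condition of Corollary 4.5 can be checked explicitly when the constraint sets have a simple description. The sketch below does this for one-dimensional intervals, an assumption made only to keep the intersection computable in closed form; the function names are hypothetical and at least two processors are assumed.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]


def intersect(intervals: List[Interval]) -> Interval:
    """Intersection of closed intervals (assumed non-empty)."""
    lo = max(l for l, _ in intervals)
    hi = min(h for _, h in intervals)
    return (lo, hi)


def is_failure_redundant(intervals: List[Interval]) -> bool:
    """True if dropping any single constraint set leaves the feasible set
    unchanged, i.e. the condition of the corollary holds for every l."""
    full = intersect(intervals)
    return all(
        intersect(intervals[:l] + intervals[l + 1:]) == full
        for l in range(len(intervals))
    )


# Two processors hold the same binding interval: any single failure is harmless.
print(is_failure_redundant([(0.0, 3.0), (0.0, 3.0), (-1.0, 5.0)]))
# Here processor 1 alone holds the binding bounds, so its failure changes Z.
print(is_failure_redundant([(0.0, 5.0), (1.0, 3.0), (-2.0, 4.0)]))
```

The design idea suggested by the corollary is thus to assign each binding constraint to more than one processor.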

The abstract problem formulation (1) and the Cutting-Plane Consensus algorithm provide a general
framework for distributed convex optimization. We show in the following that a variety of important
representations of the constraint sets are covered by this set-up. Depending on the formulation of the
local constraint sets $Z_i$, different cutting-plane oracles must be defined, leading to different realizations
of the algorithm. We specialize in the following the Cutting-Plane Consensus algorithm to three important
problem classes. We want to stress that the correctness proofs established here for the general set-up will
hold directly for the three specific problem formulations discussed in the remainder of the paper.



V. Convex Optimization with Distributed Inequality Constraints 

As a first concrete setup, we consider the most natural realization of the general problem formulation (1)
with the local constraint set defined by a convex inequality, i.e.,

$$Z_i = \{z : f_i(z) \leq 0\}. \tag{8}$$

The functions $f_i : \mathbb{R}^d \to \mathbb{R}$ are assumed to be convex, but not necessarily differentiable. Thus, the set-up
(8) also includes the case in which processor $i$ is assigned more than one constraint, say
$Z_i = \{z : f_{i1}(z) \leq 0, f_{i2}(z) \leq 0, \ldots, f_{ik}(z) \leq 0\}$. In fact, one can define $f_i(z) := \max_{j \in \{1, \ldots, k\}} f_{ij}(z)$ and directly obtain the
formulation (8).

To define a cutting-plane oracle for constraints of the form (8), we use the concept of subdifferential.
Given a query-point $z_q \in \mathbb{R}^d$, the subdifferential of $f_i$ at $z_q$ is

$$\partial f_i(z_q) = \{ g_i \in \mathbb{R}^d : f_i(z) - f_i(z_q) \geq g_i^T (z - z_q), \ \forall z \in \mathbb{R}^d \}.$$

An element $g_i \in \partial f_i(z_q)$ is called a subgradient of $f_i$ at $z_q$. If the function $f_i$ is differentiable, then its
gradient $\nabla f_i(z_q)$ is a subgradient. A cutting-plane oracle for constraints of the form (8) is now as follows,
see, e.g., [24].

Cutting-plane Oracle: If a query point $z_q$ is such that $f_i(z_q) > 0$, then the half-space

$$f_i(z_q) + g_i^T (z - z_q) \leq 0, \tag{9}$$

for some $g_i \in \partial f_i(z_q)$, is returned.

Note also that Assumption 3.1 is satisfied, since $s(z_q) = f_i(z_q) + g_i^T (z_q - z_q) = f_i(z_q)$, and $f_i(z_q) = 0$
implies $z_q \in Z_i$. If $f_i(z) := \max_{j \in \{1, \ldots, k\}} f_{ij}(z)$, then $\partial f_i(z_q) = \mathrm{Co} \bigcup \{\partial f_{ij}(z_q) : f_{ij}(z_q) = f_i(z_q)\}$, where
$\mathrm{Co}$ denotes the convex hull. Thus, the method is applicable for constraints where subgradients can be
obtained.
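The oracle (9) for a pointwise maximum of several constraint functions can be sketched as follows. The function name `subgradient_oracle`, the representation of each constraint as a pair of callables (value and gradient), and the two example functions are assumptions made for illustration. The sketch picks the gradient of one active function, which is one valid element of the convex hull described above.

```python
from typing import Callable, List, Optional, Tuple

Vector = Tuple[float, ...]
Cut = Tuple[Vector, float]  # (a, b) encodes the half-space a^T z <= b
Piece = Tuple[Callable[[Vector], float], Callable[[Vector], Vector]]


def subgradient_oracle(z_q: Vector, pieces: List[Piece]) -> Optional[Cut]:
    """Oracle for Z = {z : max_j f_j(z) <= 0} with differentiable convex f_j.

    Returns None if z_q is feasible, otherwise the cut
    f(z_q) + g^T (z - z_q) <= 0 rewritten as g^T z <= g^T z_q - f(z_q)."""
    values = [f(z_q) for f, _ in pieces]
    j = max(range(len(pieces)), key=lambda k: values[k])  # an active index
    f_val = values[j]
    if f_val <= 0:
        return None
    g = tuple(pieces[j][1](z_q))  # gradient of the active function
    b = sum(gk * zk for gk, zk in zip(g, z_q)) - f_val
    return g, b


# Hypothetical constraints: the unit disk and the half-plane z1 + z2 <= 1.
disk: Piece = (lambda z: z[0] ** 2 + z[1] ** 2 - 1.0,
               lambda z: (2.0 * z[0], 2.0 * z[1]))
half_plane: Piece = (lambda z: z[0] + z[1] - 1.0,
                     lambda z: (1.0, 1.0))

print(subgradient_oracle((2.0, 0.0), [disk, half_plane]))
print(subgradient_oracle((0.0, 0.0), [disk, half_plane]))
```

At the query point $(2, 0)$ the disk constraint is the active one with value $3$, and the returned cut reads $4 z_1 \leq 5$, which every point of the unit disk satisfies.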

Remark 5.1: An important class of constraints are semi-definite constraints of the form $Z_i = \{z :
F_i(z) := F_{i0} + z_1 F_{i1} + \cdots + z_d F_{id} \preceq 0\}$, where $F_{ij}$ are real symmetric matrices, and '$\preceq 0$' denotes
negative semi-definite. The semi-definite constraint can be formulated as an inequality constraint

$$f_i(z) := \lambda_{\max}(F_i(z)) \leq 0,$$

with $\lambda_{\max}$ the largest eigenvalue of $F_i(z)$. It is discussed, e.g., in [25], that given a query point $z_q$ and a
normalized eigenvector $v^*$ of $F_i(z_q)$ corresponding to $\lambda_{\max}(F_i(z_q))$, the vector
$g_i = [v^{*T} F_{i1} v^*, \ldots, v^{*T} F_{id} v^*]^T$
is a subgradient of $f_i(z)$. The Cutting-Plane Consensus algorithm can thus handle semi-definite constraints
and has to be seen in the context of the recent work on cutting-plane methods for semi-definite programming
[26], [27]. □

The Cutting-Plane Consensus algorithm is directly applicable to problems where processors are assigned
convex, possibly non-differentiable, inequality constraints. Such distributed problems appear in various
important applications. For example, the distributed position estimation problem in wireless sensor networks
can be formulated in the form (1) with convex inequality and semi-definite constraints available only locally
to (some of) the sensor nodes.
Application Example: Convex Position Estimation in Wireless Sensor Networks

Wide-area networks of cheap sensors with wireless communication are envisioned to be key elements
of modern infrastructure systems. In most applications, only a few sensors are equipped with localization
tools, and it is necessary to estimate the position of the other sensors, see [28].

In [29] the sensor localization problem is formulated as a convex optimization problem, which is then
solved by a central unit using semidefinite programming. The semi-definite formulation proposed in [29]
has been later extended in the literature. We formulate the distributed position estimation problem given
in [29] in the general distributed convex optimization framework (1) and show that the general Cutting-Plane
Consensus algorithm can be used for a fully distributed solution, using only local message passing
between the sensors.

Let in the following v, G IR 2 denote the known position of sensor i G {1, . . . , n}. We want to estimate 
the unknown position of an additional sensor z G IR 2 . In [|29l , two different estimation mechanisms are 
considered: (i) laser transmitters at nodes which scan through some angle, leading to a cone set, which 
can be expressed by three linear constraints of the form f(z) := afz — 6j < 0, a« G IR 2xl and b{ G IR, 
two bounding the angle and one bounding the distance and (ii) the range of the RF transmitter, leading 
to circular constraints of the form \\z — Vi|]| < r 2 . Using the Schur-complement, the quadratic constraint 
can be formulated as a semi-definite constraint of the form 



Fi{z) := (-1) 



rj/ 2 (z - v,: 



<o, 



where I 2 is the 2x2 identity matrix. Each sensor i can bound the position of the unknown sensor 





Fig. 1. Localization of the white node by set estimates of the four gray nodes. The set estimate is given by the bounding box which 
determined by the four point z x , z", z yi Zy . The four extreme points can be found with the Cutting-Plane Consensus algorithm. 

to be contained in the convex set $Z_i$, which is, depending on the available sensing mechanism, a disk
represented by a semidefinite constraint $Z_i = \{z : F_i(z) \le 0\}$, a cone $Z_i = \{z : f_{ij}(z) \le 0,\ j = 1, 2, 3\}$,
or a quadrant $Z_i = \{z : F_i(z) \le 0,\ f_{ij}(z) \le 0,\ j = 1, 2, 3\}$.
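To illustrate how a sensor could act as a cutting-plane oracle for a disk, the following minimal sketch uses the equivalent norm form $\|z - v_i\|_2 - r_i \le 0$ and a gradient cut. This is an assumption made for illustration: it is not the semidefinite oracle of the paper, and the function name `disk_oracle` is hypothetical.

```python
import math

def disk_oracle(z_q, v, r):
    """Return None if z_q lies in the disk ||z - v|| <= r,
    otherwise a cut (g, b) meaning g^T z <= b (illustrative sketch)."""
    dx, dy = z_q[0] - v[0], z_q[1] - v[1]
    dist = math.hypot(dx, dy)
    if dist <= r:
        return None  # query point is feasible, no cut needed
    # f(z) = ||z - v|| - r is convex; its gradient at z_q is the unit vector g.
    g = (dx / dist, dy / dist)
    # Linearization f(z_q) + g^T (z - z_q) <= 0 simplifies to g^T z <= g^T v + r.
    b = g[0] * v[0] + g[1] * v[1] + r
    return g, b
```

The returned half-space contains the whole disk and excludes the query point, which is the property the algorithm requires from a cut.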

The sensing nodes can now compute the smallest box $\{z \in \mathbb{R}^2 : [z_x^l, z_y^l]^T \le z \le [z_x^u, z_y^u]^T\}$ that
is guaranteed to contain the unknown position using the Cutting-Plane Consensus algorithm. As proposed
in [29], the minimal bounding box can be computed by solving four optimization problems
with linear objectives. To compute, for example, $z_x^u$ one defines the objective $c_x = [1, 0]^T$ and solves
$z_x^u := \max c_x^T z$, s.t. $z \in \bigcap_{i=1}^n Z_i$. In the same way $z_x^l$, $z_y^l$, $z_y^u$ can be determined. Figure 1 illustrates a
configuration where four nodes estimate the position of one node. A linear version of such a distributed
estimation problem, i.e., with all constraints being linear inequalities, has been considered in the previous
work 031. 



VI. Robust Optimization with Uncertain Constraints 

The general formulation (1) also covers distributed robust optimization problems with uncertain constraints.
The Cutting-Plane Consensus algorithm can therefore be used to solve a class of robust optimization
problems in peer-to-peer processor networks.

In particular, we consider constraint sets with parametric uncertainties of the form

$$Z_i = \{z : f_i(z, \theta) \le 0, \text{ for all } \theta \in \Omega_i\}, \qquad (10)$$

where $\theta$ is an uncertain parameter, taking values in the compact convex set $\Omega_i$. We assume that $f_i$ is
convex in $z$ for any fixed $\theta$. If additionally $f_i$ is concave in $\theta$ and $\Omega_i$ is a convex set, we say that the
resulting optimization problem (1) is convex [30]. As we will see later on, the first condition is crucial
for the application of the algorithm. The second condition will ensure that the problem can be solved
exactly by our algorithm.

The problem (1) with constraints of the form (10) is a distributed deterministic robust [31] or distributed
semi-infinite optimization problem [30]. Each processor has knowledge of an infinite number of constraints,
determined by the parameter $\theta$ and the uncertainty set $\Omega_i$. Uncertain constraints such as (10) appear
frequently in distributed decision problems. Here we focus on a deterministic worst-case optimization
problem, where a solution that is feasible for any possible realization of the uncertainty is sought. A
comprehensive theory for robust optimization in centralized systems has been developed and is presented,
e.g., in [31].

Currently, two main approaches are pursued in robust optimization. In one research direction
infinite, uncertain constraints are replaced by a finite number of "sampled" constraints. Sampling methods
select a finite number of parameter values and provide bounds for the expected violation of the uncertain
constraints [32]. In a distributed setup, a sampling approach has been explored in [33]. The other
research direction aims at formulating robust counterparts of the uncertain constraints (10), often leading
to nominal semidefinite problems (see, e.g., [31]). Handling the uncertain constraint (10) from a semi-infinite
optimization point of view also allows exchange methods [34] to be applied, where the sampling
point is chosen as the solution of a finite approximation of the optimization problem. Recently, cutting-plane
methods have been considered in the context of centralized robust optimization [35]. Robust
optimization in processor networks is a relatively new problem. Robust optimization for communication
networks using dual decomposition is considered in [36]. We connect the robust optimization problem with
uncertain constraints (10) to our general distributed optimization framework, and show that the Cutting-Plane
Consensus algorithm can solve the problem in processor networks. In fact, the novel Cutting-Plane
Consensus algorithm is related to the exchange and cutting-plane methods [34], [35]. We define the
cutting-plane oracle for the distributed robust optimization problem (10) as follows.

Pessimizing Cutting-Plane Oracle: Given a query point $z_q$, the worst-case parameter value $\theta_q^*$
is the maximizer of the optimization problem

$$\max_{\theta} f_i(z_q, \theta) \quad \text{s.t. } \theta \in \Omega_i. \qquad (11)$$

The query point $z_q$ is contained in $Z_i$ if and only if the value of (11) is smaller than or equal to
zero. If $z_q \notin Z_i$, then a cutting-plane is generated as

$$f_i(z_q, \theta_q^*) + g_i^T (z - z_q) \le 0, \qquad (12)$$

where $g_i \in \partial f_i(z_q, \theta_q^*)$ is a subgradient of $f_i$.

To see that (12) is a cutting-plane, note that a query point $z_q \notin Z_i$ is cut off, since $f_i(z_q, \theta_q^*) + g_i^T (z_q - z_q) =
f_i(z_q, \theta_q^*) > 0$. Additionally, for any point $z \in Z_i$, we have $0 \ge f_i(z, \theta)$ for all $\theta \in \Omega_i$, and in particular
$0 \ge f_i(z, \theta_q^*) \ge f_i(z_q, \theta_q^*) + g_i^T (z - z_q)$. Note that Assumption 3.1 is satisfied since $f_i(z_q, \theta_q^*) = 0$ implies
that $z_q \in Z_i$.

The oracle of the robust optimization problem requires solving the additional optimization problem (11)
to determine the worst-case parameter. Following [33], we call this the pessimizing step. For the
practical applicability of our algorithm it is important to stress that the pessimizing steps are performed
in parallel on different processors.

The pessimizing step can in general be performed by numerical tools. It can be solved exactly if the
problem is convex, i.e., $f_i$ is concave in the uncertain parameter.

However, even if the convexity condition is not satisfied it might still be possible to find an exact
solution. Reference [33] provides a formal discussion about when the pessimizing step can be solved
exactly or even analytically. We review here parts of the discussion. Assume, e.g., that $f_i$ is convex in
$\theta_i$ for all $z$, and $\Omega_i$ is a bounded polyhedron with the extreme points $\{\theta_i^1, \ldots, \theta_i^k\}$. The maximum of
$f_i(z, \theta_i)$ is then the maximum of $f_i(z, \theta_i^1), \ldots, f_i(z, \theta_i^k)$, and (11) can be solved exactly by evaluating
and comparing a finite number of function values. Furthermore, if $f_i(z, \theta_i)$ is an affine function in $\theta_i$, i.e.,
$f_i(z, \theta_i) = \alpha_i(z) \theta_i + \beta_i(z)$, and the uncertainty set is an ellipsoid, i.e., $\Omega_i = \{\theta : \theta = \bar\theta_i + P_i u, \|u\|_2 \le 1\}$
for some nominal parameter value $\bar\theta_i$ and a positive definite matrix $P_i$, then the worst-case parameter
value can be computed analytically as

$$\theta_i^* = \bar\theta_i + \frac{P_i P_i^T \alpha_i(z)^T}{\|P_i^T \alpha_i(z)^T\|_2}. \qquad (13)$$

Finally, if $f_i$ is affine in the uncertain parameter and the uncertainty set is a polyhedron, the pessimizing
step (11) becomes a linear program.
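The polyhedral case discussed above can be sketched in a few lines. The sketch below assumes that $f_i$ is convex in the uncertain parameter, so that the maximum over the polytope is attained at an extreme point. The constraint function `f` and the vertex list are hypothetical and serve only as an illustration.

```python
def pessimize_over_vertices(f, z_q, vertices):
    """Solve max_theta f(z_q, theta) over a polytope by checking its extreme
    points. Exact only if f is convex in theta (assumed here)."""
    best_theta = max(vertices, key=lambda theta: f(z_q, theta))
    return best_theta, f(z_q, best_theta)

# Hypothetical constraint, convex in theta: f(z, theta) = (theta^T z)^2 - 4
def f(z, theta):
    return (theta[0] * z[0] + theta[1] * z[1]) ** 2 - 4.0

# Extreme points of a hypothetical uncertainty polytope (the unit cross-polytope)
vertices = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

theta_star, worst_value = pessimize_over_vertices(f, (1.0, 3.0), vertices)
query_point_feasible = worst_value <= 0.0  # membership test of the oracle
```

If the worst-case value is positive, the query point is infeasible and the cut (12) would be built from `theta_star`.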



Computational Study: Robust Linear Programming

In the following, we evaluate the time complexity of the algorithm in a computational study for
distributed robust linear programming. We follow here [37] and consider robust linear programs in the
form (10) with linear uncertain constraints

$$a_i^T z \le b_i, \quad a_i \in \mathcal{A}_i, \quad i \in \{1, \ldots, n\}. \qquad (14)$$

The data of the constraints is only known to be contained in a set, i.e., $a_i \in \mathcal{A}_i$. Although our algorithm
can in principle handle any convex uncertainty set $\mathcal{A}_i$, we restrict ourselves for this computational study to
the important class of ellipsoidal uncertainties $\mathcal{A}_i = \{a_i : a_i = \bar a_i + P_i u_i, \|u_i\|_2 \le 1\}$. The uncertainty
ellipsoids are centered at the points $\bar a_i$ and their shapes are determined by the matrices $P_i \in \mathbb{R}^{d \times d}$. It is
known in the literature that the centralized problem can be solved as a nonlinear conic quadratic program

$$\max c^T z, \quad \text{s.t. } \bar a_i^T z + \|P_i^T z\|_2 \le b_i, \quad i \in \{1, \ldots, n\}. \qquad (15)$$

We will apply our algorithm directly to the uncertain problem model and use the nonlinear problem
formulation (15) only as a reference for the computational study. For the particular problem (14) the
pessimizing step can be performed analytically. Note that $\sup_{a_i \in \mathcal{A}_i} a_i^T z_q = \bar a_i^T z_q + \sup_{\|u\|_2 \le 1} \{u^T P_i^T z_q\} =
\bar a_i^T z_q + \|P_i^T z_q\|_2$. The worst-case parameter is therefore given by

$$a_i^* = \bar a_i + \frac{P_i P_i^T z_q}{\|P_i^T z_q\|_2}. \qquad (16)$$

A cutting-plane defined according to (12) simply takes the form $a_i^{*T} z \le b_i$, i.e., the linear constraint with
the worst-case parameter value.
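The analytic pessimizing step for ellipsoidal uncertainty can be sketched as follows. This is a minimal pure-Python illustration under the stated ellipsoid model. The function names `worst_case_row` and `robust_cut` and the list-of-lists matrix representation are assumptions made for the example.

```python
import math

def worst_case_row(a_bar, P, z_q):
    """Worst-case a* = a_bar + P P^T z_q / ||P^T z_q||_2 for the ellipsoid
    {a_bar + P u : ||u||_2 <= 1}; P is a square list-of-lists matrix."""
    d = len(z_q)
    w = [sum(P[k][j] * z_q[k] for k in range(d)) for j in range(d)]  # w = P^T z_q
    norm_w = math.sqrt(sum(x * x for x in w))
    if norm_w == 0.0:
        return list(a_bar)  # every a in the ellipsoid gives the same a^T z_q
    return [a_bar[i] + sum(P[i][j] * w[j] for j in range(d)) / norm_w
            for i in range(d)]

def robust_cut(a_bar, P, b, z_q):
    """Return the cut (a*, b), meaning a*^T z <= b, or None if z_q satisfies
    the constraint for every parameter in the ellipsoid."""
    a_star = worst_case_row(a_bar, P, z_q)
    worst_value = sum(ai * zi for ai, zi in zip(a_star, z_q))
    return None if worst_value <= b else (a_star, b)
```

The worst-case value computed inside `robust_cut` equals $\bar a_i^T z_q + \|P_i^T z_q\|_2$, which is the left-hand side of the conic constraint in (15).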
For the computational study, we generate random linear programs in the following way. The nominal
problem data $\bar a_i \in \mathbb{R}^d$ and $c \in \mathbb{R}^d$ are independently drawn from a Gaussian distribution with mean
0 and standard deviation 10. The coefficients of the vector $b$ are then computed as $b_i = (\bar a_i^T \bar a_i)^{1/2}$. This
random linear program model was originally proposed in [38]. The matrices $P_i$ are generated as
$P_i = M_i^T M_i$ with the coefficients of $M_i \in \mathbb{R}^{d \times d}$ chosen randomly according to a normal distribution
with mean 0 and standard deviation 1. All simulations are done with dimension $d = 10$. We consider the
number of communication rounds required until the query points of all processors are close to the optimal
solution $z^*$, i.e., we stop the algorithm centrally if $\|z^{[i]}(t) - z^*\|_2 \le 0.1$ for all $i \in V$. In Figure 2 the
completion time for two different communication graphs is illustrated. We compare random Erdős-Rényi
graphs with edge probability $p = 1.2 \frac{\log(n)}{n}$ and circulant graphs with 5 out-neighbors for each processor.
It can be seen in Figure 2 that the number of communication rounds grows with the network size for the
circulant graphs, which have a growing diameter, but remains almost constant for the random Erdős-Rényi
graphs, which always have a small diameter. The simulations suggest that the completion time depends
primarily on the diameter of the communication graph.

Fig. 2. Average number of communication rounds and 95% confidence interval required to compute the optimal solution to randomly
generated robust linear programs with a precision of $\epsilon \le 0.1$ for Erdős-Rényi graphs (blue) and circulant graphs (red) with the Cutting-Plane
Consensus (CPC) algorithm. The dashed line shows, for comparison, the number of iterations of the ADMM algorithm with dual
decomposition.
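The growth of the diameter of the circulant graphs can be checked with a short breadth-first search. The sketch below assumes that each processor $i$ sends to the processors $i+1, \ldots, i+5$ (mod $n$); this particular circulant construction is an assumption, since the text only fixes the number of out-neighbors.

```python
from collections import deque

def circulant_out_neighbors(n, k):
    """Directed circulant graph: node i sends to i+1, ..., i+k (mod n).
    This construction is assumed for illustration."""
    return {i: [(i + s) % n for s in range(1, k + 1)] for i in range(n)}

def diameter(adj):
    """Longest shortest-path length over all ordered node pairs
    (the graph is assumed to be strongly connected)."""
    worst = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        worst = max(worst, max(dist.values()))
    return worst

# Diameter for a few network sizes with 5 out-neighbors per processor
diameters = {n: diameter(circulant_out_neighbors(n, 5)) for n in (20, 50, 100)}
```

Under this construction the diameter grows linearly with $n$, which is consistent with the growing number of communication rounds reported for the circulant graphs.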



For comparison, we consider the ADMM algorithm combined with a dual decomposition, as described,
e.g., in [13, pp. 48], to solve the nominal conic quadratic problem representation (15) of the robust
optimization problem.$^1$ In one iteration of the ADMM algorithm, all processors must update their local
variables synchronously and then compute the average of all decision variables. Figure 2 (right axis) shows
the number of iterations of the ADMM algorithm required to compute the solution to the random linear programs with the
same precision as the Cutting-Plane Consensus algorithm. Note that the ADMM algorithm requires almost
three times as many iterations as the Cutting-Plane Consensus algorithm requires communication rounds.
Note also that each iteration of the ADMM algorithm requires an averaging of the local solutions,
which can be done by a consensus algorithm. Taking into account that the number of communication
rounds required to compute an average by a consensus algorithm is lower bounded by $\Omega(n^2 \log(1/\delta))$,
where $\delta$ is the desired precision [39], processors running the ADMM algorithm need
to communicate significantly more often than processors running the Cutting-Plane Consensus algorithm.
Although the simulations do not compare the time complexity of the algorithms in terms of computation
units, they suggest that the Cutting-Plane Consensus algorithm is advantageous for applications
where communication is costly or time consuming.

$^1$ We use in the simulations a step-size $\rho = 200$, see [13, Chapter 7] for the notation. Please note that the choice of the step-size of the
ADMM method has to be done heuristically. We have selected the step-size as the best step-size we found experimentally for the smallest
problem scenario $n = 20$. Although the convergence speed of the ADMM method might improve with another step-size, in our experience
most heuristic choices led to a significant deterioration of the performance.



VII. Separable Cost Optimization with Distributed Column Generation 

The general convex problem set-up (1) also covers the important class of almost separable
optimization problems, i.e., problems where each processor is assigned local decision variables with a
local objective function and the local variables are coupled by a coupling constraint. We sketch here the
application of the Cutting-Plane Consensus algorithm to convex problems with separable costs and linear
coupling constraints of the form

$$\min_{x} \sum_{i=1}^n f_i(x_i) \quad \text{s.t. } \sum_{i=1}^n G_i x_i = h, \quad x_i \in X_i, \qquad (17)$$

where $x_i \in \mathbb{R}^{m_i}$ is the decision vector assigned to processor $i$, $f_i : \mathbb{R}^{m_i} \to \mathbb{R}$ is a convex objective function
processor $i$ aims to minimize, and $X_i \subset \mathbb{R}^{m_i}$ is a convex set, defining the feasible region for the decision
vector $x_i$. For clarity of presentation, we assume here that all sets $X_i$ are bounded, although this
assumption can be relaxed. The local decision variables $x_i$ are all coupled by a linear separable constraint
with a right-hand side vector $h \in \mathbb{R}^r$. The coupling linear constraint is of dimension $r$, and we assume
here that $r$ is small compared to the number of decision variables, i.e., $r \ll \sum_{i=1}^n m_i$.



The problem formulation (17) is the standard formulation considered for large scale optimization with
decomposition methods 0. Standard large-scale optimization methods for (17) exploit the separable
structure of the dual problem, and define a coordinating master program and several sub-problems, leading
to a structure as shown in Figure 3(a). In contrast, we are seeking an optimization method without a
master problem using only asynchronous message-passing between neighboring processors, as visualized
in Figure 3(b).

The method we propose here is strongly related to the classical Dantzig-Wolfe (DW) decomposition or
column generation [19], O. The DW decomposition is dual to the cutting-plane method, see e.g., [20].
We exploit this duality relation here. Once again we want to stress that the DW decomposition requires 
a coordinating master problem, which is not required for our algorithm. In fTTVll we proposed a similar 
algorithm for purely linear programs taking only the primal perspective on the problem. 

The problem (17) can be formulated in the general framework (1) when its dual is considered. Let
$\pi \in \mathbb{R}^r$ be the dual variable corresponding to the coupling constraint. The dual problem to (17) can then
be written as

$$\max_{\pi} \; -h^T \pi + \sum_{i=1}^n \left\{ \min_{x_i \in X_i} f_i(x_i) + \pi^T G_i x_i \right\}.$$

One can now define a new variable $u_i := \min_{x_i \in X_i} f_i(x_i) + \pi^T G_i x_i$, leading to the alternative representation
of the dual as

$$\max_{\pi, u} \; -h^T \pi + \sum_{i=1}^n u_i \quad \text{s.t. } (\pi, u) \in \{(\pi, u) : u_i \le f_i(x_i) + \pi^T G_i x_i, \ \forall x_i \in X_i\}. \qquad (18)$$

(a) Structure of the classical Dantzig-Wolfe decomposition.

(b) Structure of the Cutting-Plane Consensus algorithm for
separable problems.

Fig. 3. Comparison of the classical master / subproblem structure of the DW decomposition and the peer-to-peer structure of the Cutting-Plane
Consensus algorithm.



This problem is explicitly in the form (1) with $z = [\pi^T, u_1, \ldots, u_n]^T \in \mathbb{R}^{r+n}$, $c = [-h^T, \mathbf{1}_n^T]^T$ and
$Z_i := \{(\pi, u_i) : u_i \le f_i(x_i) + \pi^T G_i x_i, \ \forall x_i \in X_i\}$. The cutting-plane oracle can now be defined as follows.
A query point is denoted as $z_q = [\pi_q^T, u_{q,1}, \ldots, u_{q,n}]^T$ and is contained in the set $Z_i$ if and only if

$$u_{q,i} \le f_i(x_i) + \pi_q^T G_i x_i, \quad \forall x_i \in X_i.$$

Constraint Generating Oracle: Let $\bar x_i$ denote the optimal solution vector to

$$\min_{x_i} f_i(x_i) + \pi_q^T G_i x_i, \quad \text{s.t. } x_i \in X_i \qquad (19)$$

and let $\gamma^*$ be the optimal value of (19). If $u_{q,i} > \gamma^*$ then $z_q \notin Z_i$. A cutting plane separating $z_q$
and $Z_i$ is then

$$u_i - f_i(\bar x_i) - \pi^T G_i \bar x_i \le 0. \qquad (20)$$

Clearly, $u_{q,i} - f_i(\bar x_i) - \pi_q^T G_i \bar x_i > 0$ for $(\pi_q, u_q) \notin Z_i$ and $u_i - f_i(\bar x_i) - \pi^T G_i \bar x_i \le 0$ for all $(\pi, u) \in Z_i$.
Also, Assumption 3.1 holds since $s(z_q) = u_{q,i} - f_i(\bar x_i) - \pi_q^T G_i \bar x_i$ and $s(z_q) \to 0$ implies $(\pi_q, u_q) \in Z_i$.
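A scalar instance may help to make the constraint generating oracle concrete. The sketch below assumes a single decision variable with $f_i(x) = \alpha x + \beta x^2$, $\beta > 0$, an interval $X_i = [lo, hi]$ and a scalar coupling coefficient $g$. All of these choices, as well as the function name, are hypothetical and only serve as an illustration of (19) and (20).

```python
def constraint_generating_oracle(pi_q, u_q, alpha, beta, g, lo, hi):
    """Scalar sketch of the oracle: f(x) = alpha*x + beta*x**2 with beta > 0,
    X = [lo, hi], coupling coefficient g (all assumed for illustration).
    Returns None if the query point (pi_q, u_q) is feasible, otherwise the
    pair (G_bar, f_bar) encoding the cut u - G_bar * pi <= f_bar."""
    # Subproblem (19): minimize f(x) + pi_q*g*x over [lo, hi]. The objective is
    # a convex parabola, so clipping the unconstrained minimizer is optimal.
    x_bar = min(hi, max(lo, -(alpha + pi_q * g) / (2.0 * beta)))
    f_bar = alpha * x_bar + beta * x_bar ** 2
    gamma_star = f_bar + pi_q * g * x_bar  # optimal value of the subproblem
    if u_q <= gamma_star:
        return None  # query point lies in the constraint set
    return g * x_bar, f_bar  # coefficients of the cut (20)
```

For example, with `alpha = beta = g = 1`, `X = [-2, 2]` and the query point `(pi_q, u_q) = (1, 0)`, the subproblem is solved by `x_bar = -1` and the generated cut reads $u + \pi \le 0$, which the query point violates.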



The proposed procedure of constructing a constraint is known as "constraint generation" or, taking the
primal perspective, as "column generation". We name (19) the local subproblem $SP_i$, since it corresponds
to the subproblem of the DW decomposition. The approximate linear program formed by each processor
is called here the local master problem $MP_i$, since it is a local version of the master program of the DW
decomposition.

It is worth noting that here $z = [\pi^T, u_1, \ldots, u_n]^T$ and thus the dimension of the problem, $d = r + n$,
is no longer independent of the number of processors. Additionally, the set-up considered in this section
requires a unique identifier to be assigned to each processor. These two additional restrictions have to be
taken into account for an implementation of the algorithm.

The Cutting-Plane Consensus algorithm is applied here to the dual problem and will compute the dual
solution to (17), i.e.,

$$\lim_{t \to \infty} \|\pi^{[i]}(t) - \pi^*\|_2 = 0.$$

If all $f_i(\cdot)$ in (17) are strictly convex, the solutions of the local subproblems (19) of each processor will
converge to the optimal solution, i.e., $\lim_{t \to \infty} \|\bar x_i(t) - x_i^*\| = 0$ for all $i \in V$, where $x^* = [x_1^*, \ldots, x_n^*]$ is
the optimal primal solution to (17), and $\bar x_i(t)$ is the solution to (19) computed by processor $i$ at time $t$.
However, this is not true if some $f_i(\cdot)$ are only convex but not strictly convex. Then recovering a primal
optimal solution from the dual solution can be done using the method known from DW decomposition.
We assume that each processor stores the points at which a constraint is generated, $\bar x_i(\tau)$, where the index
$i$ indicates which processor computed at time $\tau$ the point $\bar x_i(\tau)$ as solution to (19). Define $G_{i\tau} := G_i \bar x_i(\tau)$
and $f_{i\tau} := f_i(\bar x_i(\tau))$. The scalar inequalities of the approximate linear program are all of the form

$$u_i - G_{i\tau}^T \pi \le f_{i\tau}. \qquad (21)$$

One can now formulate the linear programming dual to the approximate program (3). Let $\lambda_{i\tau} \in \mathbb{R}_{\ge 0}$ be
the Lagrange multiplier to the constraint (21). The linear programming dual to (3) is a linear program with
the following structure:

$$\min_{\lambda_{i\tau} \ge 0} \sum_{i=1}^n \sum_{\tau} f_{i\tau} \lambda_{i\tau} \quad \text{s.t. } \sum_{i=1}^n \sum_{\tau} G_{i\tau} \lambda_{i\tau} = h, \quad \sum_{\tau} \lambda_{i\tau} = 1, \ i \in \{1, \ldots, n\}. \qquad (22)$$

We assume in the following that all processors have the same set of constraints (21) as their basis.
Please note that this can be achieved by halting the algorithm at some time and running a suitable
agreement mechanism, such as the one proposed in [15]. A processor can now reconstruct its component
of the solution vector as the convex combination $x_i^* = \sum_\tau \bar x_i(\tau) \lambda_{i\tau}^*$, where $\lambda_{i\tau}^*$ solves (22). The resulting
solution vector $x^* = [x_1^*, \ldots, x_n^*]$ is globally feasible since $x_i^* \in X_i$ by convexity of $X_i$ and
$\sum_{i=1}^n \sum_\tau G_{i\tau} \lambda_{i\tau}^* = \sum_{i=1}^n G_i (\sum_\tau \bar x_i(\tau) \lambda_{i\tau}^*) = \sum_{i=1}^n G_i x_i^* = h$. Additionally, if all processors have computed
the globally optimal solution to (18), then the recovered $x_i^* = \sum_\tau \bar x_i(\tau) \lambda_{i\tau}^*$ is also the optimal primal solution
to (17). To see this, note that strong duality implies that the optimal value of (22) is equal to the value of
the linear approximate problem (3), which we denote by $f^*$. Thus, $f^* = \sum_{i=1}^n \sum_\tau f_i(\bar x_i(\tau)) \lambda_{i\tau}^*$. Convexity of $f_i(\cdot)$ and
$\sum_\tau \lambda_{i\tau}^* = 1$ imply that $f^* = \sum_{i=1}^n \sum_\tau f_i(\bar x_i(\tau)) \lambda_{i\tau}^* \ge \sum_{i=1}^n f_i(\sum_\tau \bar x_i(\tau) \lambda_{i\tau}^*) = \sum_{i=1}^n f_i(x_i^*)$. Since
$x^* = [x_1^*, \ldots, x_n^*]$ is a feasible solution, it must hold that $\sum_{i=1}^n f_i(x_i^*) = f^*$. Please note that the proposed
method requires each processor to store its own local solutions $\bar x_i(t)$ to (19) generated during the evolution
of the algorithm, but does not require that the processors exchange those solutions. For a more explicit
discussion on the reconstruction of the feasible solution, we refer the reader to the literature on nonlinear
DW-decomposition [3] or our recent paper IfTTTl . 



Application Example: Distributed Microgrid Control 

The previous discussion shows that the Cutting-Plane Consensus algorithm is applicable to many
important control problems, such as distributed microgrid control. Microgrids are local
collections of distributed energy sources, energy storage devices and controllable loads. Most existing
control strategies still use a central controller to optimize the operation [40], while for several reasons,
detailed, e.g., in [40], distributed control strategies, which do not require collecting all data at a central
coordinator, are desirable.

We consider the following optimization model of the microgrid, described recently in [41]. A microgrid
consists of several generators, controllable loads, storage devices and a connection to the main grid over
which power can be bought or sold. In the following, we use the notational convention that energy
generation corresponds to positive variables, while energy consumption corresponds to negative variables.
A generator generates power $p_{gen}(t)$, $t \in [0, T]$, within the absolute bounds $\underline{p}(t) \le p_{gen}(t) \le \bar p(t)$ and
the rate constraints $\underline{r}(t) \le p_{gen}(t + 1) - p_{gen}(t) \le \bar r(t)$. The cost to produce power by a generator is
modeled as a quadratic function $f_{gen}(t) = \alpha p_{gen}(t) + \beta p_{gen}^2(t)$. A storage device can store or release
power $p_{st}(t)$, $t \in [0, T]$, within the bounds $-d_{st} \le p_{st}(t) \le c_{st}$. The charge level of the storage device
is then $q_{st}(t) = q_{st,init} + \sum_{\tau=0}^{t} p_{st}(\tau)$ and must be maintained between $0 \le q_{st}(t) \le q_{max}$. Note that
$p_{st}(t)$ takes negative values if the storage device is charged and positive values if it is discharged. A
controllable load has a desired load profile $l_{cl}(t)$ and incurs a cost if the load is not satisfied, i.e.,
$f_{cl}(t) = \alpha (l_{cl}(t) - p_{cl}(t))_+$, where $(z)_+ = \max\{0, z\}$. Finally, the microgrid has a single control unit,
which coordinates the connection to the main grid and can trade energy. The maximal energy that can be
traded is $|p_{tr}| \le E$. The cost to sell or buy energy is modeled as $f_{tr} = -c^T p_{tr} + \gamma^T |p_{tr}|$, where $c$ is the
price vector and $\gamma$ is a general transaction cost.
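The local cost functions of the model can be written down directly. The following sketch sums the per-step costs over the horizon, which is an assumption about how the local objectives are aggregated. The function names and the use of plain Python lists for the power trajectories are hypothetical.

```python
def generator_cost(p_gen, alpha, beta):
    """Quadratic generation cost alpha*p + beta*p^2, summed over the horizon."""
    return sum(alpha * p + beta * p * p for p in p_gen)

def load_cost(p_cl, l_cl, alpha):
    """Penalty alpha * (l_cl(t) - p_cl(t))_+ for unsatisfied load, summed over
    the horizon, with (z)_+ = max(0, z)."""
    return sum(alpha * max(0.0, l - p) for l, p in zip(l_cl, p_cl))

def trading_cost(p_tr, c, gamma):
    """Trading cost -c^T p_tr + gamma^T |p_tr| with price vector c and
    transaction cost vector gamma."""
    return sum(-ci * p + gi * abs(p) for ci, gi, p in zip(c, gamma, p_tr))
```

The load and trading costs are piecewise linear, so they are convex but not strictly convex.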

The power demand $D(t)$ in the microgrid is predicted over a horizon $T$. The control objective is to
minimize the cost of power generation while satisfying the overall demand. This control problem can be
directly formulated in the form (17), with the local objective functions $f_i = \sum_t f_i(t)$, the right-hand
side vector of the coupling constraint as the predicted demand $h = [D(1), \ldots, D(T)]^T$ and $X_i$ as the local
constraints of each unit.

The Cutting-Plane Consensus algorithm can solve this problem in a distributed way. Note that the
objective functions $f_i$ considered here are all convex, but not strictly convex. If all objective functions
were strictly convex, one could use the distributed Newton method flU, which has locally a quadratic
convergence rate. However, the distributed Newton method does not apply to this problem formulation.
The Cutting-Plane Consensus algorithm does not require strict convexity of the cost functions.

We present simulation results for an example set-up with $n = 101$ decision units, i.e., 60 generators, 20
storage devices, 20 controllable loads and one connection to the main grid. A random demand is predicted
for 15 minute time intervals over a horizon of three hours, based on a constant off-set, a sinusoidal growth
and a random component. The algorithm is initialized with each processor computing a basis out of the
box-constraint set $\{z : -10^5 \cdot \mathbf{1} \le z \le 10^5 \cdot \mathbf{1}\}$, leading to a very high initial objective value. Figure 4
shows the largest objective value over all processors, relative to the best solution found, as the algorithm
progresses. The evolution of the objective value is shown for three different $k$-regular graphs.
It can be clearly seen that the convergence speed depends strongly on the structure of the communication
graph. The convergence for a network with a 2-regular communication structure is significantly slower than
for a network with a higher degree. We also want to emphasize the observation that the difference
in the convergence speed between $k = 8$ and $k = 32$ is not as large as the increased communication
would lead one to expect. This shows that the improvement obtained from more communication between the
processors diminishes as the amount of communication increases. A good performance of the algorithm can also
be obtained with little communication between the processors. Please note that for all communication
graphs the Cutting-Plane Consensus algorithm requires only a few communication rounds to converge to a
fairly good solution. Although the convergence to an exact optimal solution might take more iterations, a
good sub-optimal solution can be found after very few communication rounds. This property makes the
Cutting-Plane Consensus algorithm attractive for control and decision applications.

Fig. 4. Trajectories of the scaled maximal optimal value of the linear approximate programs for different $k$-regular communication graphs.

VIII. Discussion and Conclusions

We proposed a framework for distributed convex and robust optimization using a polyhedral approximation
method. As a general problem formulation, we consider problems where convex constraint sets are
distributed to processors, and the processors have to compute the optimizer of a linear objective function
over the intersection of the constraint sets. We proposed the novel Cutting-Plane Consensus algorithm
as an asynchronous algorithm for peer-to-peer networks. The algorithm scales well to
large networks in the sense that the amount of data each processor has to store and process is small and
independent of the network size.

The appealing property of the considered outer-approximation method lies in the fact that it imposes
very few requirements on the structure of the constraint sets. The only requirement is that a cutting-plane
oracle exists. We have presented oracles for various formulations of the constraint sets, in particular,
inequality and convex uncertain or semi-infinite constraints. We also showed that, when the dual problem
formulation is considered, almost separable convex optimization problems can be formulated in the
proposed framework. We showed for each of the proposed problem formulations how the cutting-plane
oracle can be defined.

Finally, we illustrated that the proposed set-up is of interest for various decision and control problems. 
These include the localization problem in sensor networks. They include also less obvious problems 
as, e.g., distributed microgrid control, where the novel algorithm can be applied to the dual problem 
formulation. In this context we showed that the application of the algorithm to the dual problem has the 
major advantage that a feasible solution can be found in a fully distributed way even before the algorithm 
has converged to an optimal solution. 

Appendix 



A. Proofs of Section III 



1) Proof of Proposition 3.3: The minimal 2-norm solution is the solution to

$$\min_{z, y} \frac{1}{2} z^T z, \quad \text{s.t. } A_H^T z \le b_H, \ A_H y = c, \ c^T z - b_H^T y = 0, \ y \ge 0, \qquad (23)$$

where the constraints represent the linear programming optimality conditions (KKT conditions). The
Lagrangian of (23) can be directly determined to be

$$\mathcal{L}(z, y, u, l, \alpha) = \frac{1}{2} z^T z + u^T (A_H^T z - b_H) + l^T (A_H y - c) + \alpha (c^T z - b_H^T y), \quad y, u \ge 0. \qquad (24)$$

It follows now that $y^* = \arg\min_{y \ge 0} \mathcal{L}(z, y, u, l, \alpha) = 0$ if $A_H^T l - \alpha b_H \ge 0$. From $z^* = \arg\min_z \mathcal{L}(z, y, u, l, \alpha)$
it follows that $z^* = -A_H u - \alpha c$. The problem (6) stated in the proposition is now $\min_{u \ge 0, l, \alpha} -\mathcal{L}(z^*, y^*, u, l, \alpha)$.



2) Proof of Lemma 3.4: The minimal 2-norm solution $z_H^*$ is the unique minimizer of

$$\min_z \frac{1}{2} \|z\|_2^2, \quad \text{s.t. } c^T z \ge \gamma_H, \ A_H^T z \le b_H,$$

and satisfies therefore the feasibility conditions $c^T z_H^* = \gamma_H$ and $A_H^T z_H^* \le b_H$. Since $z_H^*$ is an optimal
solution, there exist multipliers $\mu^* \in \mathbb{R}_{\ge 0}$ and $\lambda^* \in \mathbb{R}_{\ge 0}^{|H|}$ such that the KKT conditions are satisfied, i.e.,

$$z_H^* - \mu^* c + A_H \lambda^* = 0, \qquad (25)$$
$$\lambda^{*T} A_H^T z_H^* - \lambda^{*T} b_H = 0. \qquad (26)$$

Since $z_H^*$ is also a solution to the original linear program (3), there also exists a multiplier vector $y^* \in \mathbb{R}_{\ge 0}^{|H|}$
satisfying the linear programming optimality conditions

$$-c + A_H y^* = 0, \qquad (27)$$
$$y^{*T} A_H^T z_H^* - b_H^T y^* = 0. \qquad (28)$$

We have to show now that the existence of $z_H^*$, $\mu^*$, $\lambda^*$ and $y^*$ implies, for a sufficiently small $\epsilon$, the
existence of a multiplier vector $\pi^*$ satisfying the optimality conditions of Q, which are

$$-c + \epsilon z_H^* + A_H \pi^* = 0 \quad \text{and} \quad \pi^{*T} A_H^T z_H^* - \pi^{*T} b_H = 0. \qquad (29)$$

We distinguish now the two cases $\mu^* > 0$ and $\mu^* = 0$. First, assume $\mu^* > 0$. We can multiply (25) with
$\frac{t}{\mu^*}$, for arbitrary $t \in (0, 1]$, and add to this (27), multiplied by $(1 - t)$, to obtain

$$\frac{t}{\mu^*} z_H^* - c + A_H \left( \frac{t}{\mu^*} \lambda^* + (1 - t) y^* \right) = 0. \qquad (30)$$

The same steps can be repeated with (26) and (28) to obtain

$$\left( \frac{t}{\mu^*} \lambda^{*T} + (1 - t) y^{*T} \right) (A_H^T z_H^* - b_H) = 0. \qquad (31)$$

With (30) and (31), for any $\epsilon \le \frac{1}{\mu^*}$ one can define $t_\epsilon = \epsilon \mu^*$. Then $\pi^* = \frac{t_\epsilon}{\mu^*} \lambda^* + (1 - t_\epsilon) y^*$ solves (29).
In the second case $\mu^* = 0$, one can pick an arbitrary $\epsilon > 0$, multiply (25) (and (26), respectively) with $\epsilon$
and add (27) (or (28), respectively) to obtain $\epsilon z_H^* - c + A_H (\epsilon \lambda^* + y^*) = 0$ and $(\epsilon \lambda^* + y^*)^T (A_H^T z_H^* - b_H) = 0$.
Now, $\pi^* := (\epsilon \lambda^* + y^*)$ solves (29). ■



B. Proofs of Section IV: Correctness of the Algorithm 

Some technical properties of the algorithm are formalized in the following result. 
Lemma A.1: Let $z^{[i]}(t)$ be the query point and $B^{[i]}(t)$ the corresponding basis. Let $\mathcal{B}^{[i]}(t) \subset \mathbb{R}^d$ be the
feasible set induced by $B^{[i]}(t)$. Then,

(i) $\mathcal{B}^{[i]}(t) \supseteq Z$ for all $i \in \{1, \ldots, n\}$ and $t \ge 0$;

(ii) $\lim_{t \to \infty} z^{[i]}(t) = \bar z$ and $\bar z \in Z$ implies $\bar z$ is an optimizer of (1);

(iii) there exists $\bar\epsilon > 0$ such that for all $i \in \{1, \ldots, n\}$ and all $t \ge 0$, the query points maximize
the objective function

$$J_\epsilon(z) := c^T z - \frac{\epsilon}{2} \|z\|_2^2$$

over the set of constraints $B^{[i]}(t) \cup Y^{[i]}(t)$ (as defined in (S2)) for all $\epsilon \in [0, \bar\epsilon]$;

(iv) $J_\epsilon(z^{[i]}(t + 1)) \le J_\epsilon(z^{[i]}(t))$ for all $i \in \{1, \ldots, n\}$ and all $t \ge 0$;

(v) if $\mathcal{G}_c$ is a strongly connected static graph, then $J_\epsilon(z^{[j]}(t + \mathrm{diam}(\mathcal{G}_c))) \le J_\epsilon(z^{[i]}(t))$ for all $i, j \in
\{1, \ldots, n\}$ and all $t \ge 0$.

Proof: To see (i), note that any cut hk generated by the oracle of processor i, ORC(-,Zj) is such 
that the half-space h k contains Z,i, and in particular h k contains Z = HILi Thus any collection of 
cuts H = \J k hk, generated by arbitrary processors is such that "H D DlLi = 2> an d in particular 
B^(t) D The claim (ii) follows since z™(t) is computed as a maximizer of the linear cost c T z over the 
collection of cutting-planes i/|^ p (t). The induced polyhedron is such that 7/^, p (t) D Therefore, we 
can conclude that c T z^(t) > c T z*, where z* is an optimizer of ([T]). By continuity of the linear objective 
function, we have that c T z > c T z* On the other hand, c T z < c T z* for all z G Z. This proves the statement. 
The statement (iii) follows from Lemma 3.4. For any approximate program defined by processor $i$ at time $t$, there exists a constant $\epsilon_{i,t} > 0$ such that $z^{[i]}(t)$ is the unique maximizer of the family of strictly concave objective functions $J_\epsilon(z) := c^T z - \frac{\epsilon}{2}\|z\|_2^2$, $\epsilon \in [0, \epsilon_{i,t}]$, over the set of constraints $B^{[i]}(t) \cup Y^{[i]}(t)$. One can now always find $\bar{\epsilon} > 0$ such that $\bar{\epsilon} \leq \epsilon_{i,t}$ for all $i \in \{1, \ldots, n\}$ and $t \geq 0$. To see claim (iv), note that adding cutting-planes, either by receiving them from neighbors (S2) or by generating them with the oracle (S3), can only decrease the value of the strictly concave objective function $J_\epsilon(\cdot)$, and the basis computation in (S4) keeps, by its definition, the value of $J_\epsilon(\cdot)$ constant. Finally, (v) can be seen as follows. Starting at any time $t$ at some processor $i$, at time $t+1$ all processors $l \in \mathcal{N}_O(i,t)$ have received the basis of processor $i$ and compute a query point that satisfies $J_\epsilon(z^{[l]}(t+1)) \leq J_\epsilon(z^{[i]}(t))$ for all $l \in \mathcal{N}_O(i,t)$. This argument can be applied repeatedly to see that, in the static, strongly connected communication graph $\mathcal{G}_c$, after at most $\mathrm{diam}(\mathcal{G}_c)$ iterations all processors in the network have an objective value no larger than $J_\epsilon(z^{[i]}(t))$. ■
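Properties (i), (ii) and (iv) can be illustrated numerically. The following is only a minimal sketch under simplifying assumptions: a single processor, a single constraint set (the unit disk in the plane), the plain linear cost ($\epsilon = 0$), and a linear program solved by brute-force vertex enumeration. The function names `solve_lp_2d`, `oracle_unit_disk` and `cutting_plane`, the initial box and the tolerances are hypothetical choices for this illustration and are not part of the algorithm as stated in the paper.

```python
import itertools
import math

def solve_lp_2d(c, cuts, tol=1e-10):
    """Maximize c^T z over {z : a^T z <= b for all (a, b) in cuts}.

    Brute-force vertex enumeration; assumes a bounded, nonempty polygon.
    """
    best, best_val = None, -math.inf
    for (a1, b1), (a2, b2) in itertools.combinations(cuts, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:
            continue  # parallel lines define no vertex
        # Cramer's rule for the intersection of the two boundary lines.
        z = ((b1 * a2[1] - b2 * a1[1]) / det, (a1[0] * b2 - a2[0] * b1) / det)
        if all(a[0] * z[0] + a[1] * z[1] <= b + tol for a, b in cuts):
            val = c[0] * z[0] + c[1] * z[1]
            if val > best_val:
                best, best_val = z, val
    return best, best_val

def oracle_unit_disk(z, tol=1e-8):
    """Return a cut (a, b) separating z from the unit disk, or None if z is feasible."""
    norm = math.hypot(z[0], z[1])
    if norm <= 1.0 + tol:
        return None
    # Half-space a^T z' <= 1 contains the disk; a^T z - 1 = norm - 1 > 0 at z.
    return ((z[0] / norm, z[1] / norm), 1.0)

def cutting_plane(c, iters=40):
    # Initial outer approximation: the box |z_1| <= 2, |z_2| <= 2.
    cuts = [((1.0, 0.0), 2.0), ((-1.0, 0.0), 2.0),
            ((0.0, 1.0), 2.0), ((0.0, -1.0), 2.0)]
    values, violations = [], []
    for _ in range(iters):
        z, val = solve_lp_2d(c, cuts)
        values.append(val)
        violations.append(max(0.0, math.hypot(z[0], z[1]) - 1.0))
        cut = oracle_unit_disk(z)
        if cut is None:
            break
        cuts.append(cut)
    return z, values, violations

z, values, violations = cutting_plane((2.0, 1.0))
print(values[:4], violations[-1])
```

In this sketch every entry of `values` is an upper bound on the true optimal value $\|c\|_2 = \sqrt{5}$ (outer approximation, as in (i) and (ii)), and the entries do not increase as cuts are added (as in (iv)).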



Next we present the proofs of Lemma 4.1, Lemma 4.2, and Theorem 4.3. The following proofs use the parameterized cost function $J_\epsilon(\cdot)$. However, for clarity of presentation we simplify our notation in the following proofs and write simply $J(\cdot)$ instead of $J_\epsilon(\cdot)$.



1) Proof of Lemma 4.1: All $z^{[i]}(t)$ are computed as maximizers of the common strictly concave objective function $J(\cdot)$ (Lemma A.1 (iii)) and $J(\cdot)$ is monotonically non-increasing over the sequence of query points computed by a processor (Lemma A.1 (iv)). Any sequence $\{J(z^{[i]}(t))\}_{t \geq 0}$, $i \in \{1, \ldots, n\}$, therefore has a limit point, i.e., $\lim_{t\to\infty} J(z^{[i]}(t)) = \bar{J}^{[i]}$. Since the sequence is convergent, it holds that $\lim_{t\to\infty} \big(J(z^{[i]}(t)) - J(z^{[i]}(t+1))\big) = 0$. By strict concavity of $J(\cdot)$ it follows that $J(z^{[i]}(t)) - J(z^{[i]}(t+1)) \geq \sigma \|z^{[i]}(t) - z^{[i]}(t+1)\|_2^2$ for some $\sigma > 0$. Consequently, $\lim_{t\to\infty} \|z^{[i]}(t) - z^{[i]}(t+1)\|_2 = 0$ and the sequence of query points has a limit point, i.e., $\lim_{t\to\infty} \|z^{[i]}(t) - \bar{z}^{[i]}\|_2 = 0$.

Suppose now, to get a contradiction, that $\bar{z}^{[i]} \notin \mathcal{Z}_i$. Then there exists $\delta > 0$ such that all $z$ satisfying $\|z - \bar{z}^{[i]}\|_2 < \delta$ are not contained in $\mathcal{Z}_i$. Since $\lim_{t\to\infty} \|z^{[i]}(t) - \bar{z}^{[i]}\|_2 = 0$, there exists a time instant $T_\delta$ such that $\|z^{[i]}(t) - \bar{z}^{[i]}\|_2 < \delta$ for all $t \geq T_\delta$, and thus $z^{[i]}(t) \notin \mathcal{Z}_i$ for $t \geq T_\delta$. But now, for all $t \geq T_\delta$ the oracle $\mathrm{ORC}(z^{[i]}(t), \mathcal{Z}_i)$ will generate a cutting-plane according to (2), cutting off $z^{[i]}(t)$. By the definition of this cutting-plane, it must hold that $a^T(z^{[i]}(t))\, z^{[i]}(t) - b(z^{[i]}(t)) = s(z^{[i]}(t)) > 0$ and $a^T(z^{[i]}(t))\, z^{[i]}(t+1) - b(z^{[i]}(t)) \leq 0$. Subtracting the second relation from the first gives $a^T(z^{[i]}(t)) \big(z^{[i]}(t) - z^{[i]}(t+1)\big) \geq s(z^{[i]}(t)) > 0$ and consequently, by the Cauchy-Schwarz inequality, $\|z^{[i]}(t) - z^{[i]}(t+1)\|_2 \geq \big(\|a(z^{[i]}(t))\|_2\big)^{-1} s(z^{[i]}(t))$. By Assumption 3.1 (i), $\|a(z^{[i]}(t))\|_2 < \infty$ and thus $\lim_{t\to\infty} s(z^{[i]}(t)) = 0$. As a consequence of Assumption 3.1 (ii), it follows directly that $\bar{z}^{[i]} \in \mathcal{Z}_i$, providing the contradiction. ■
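The quadratic bound invoked from strict concavity can be made explicit in a simplified setting. The derivation below is a sketch under an added assumption that is not stated in the proof: $w$ maximizes $J_\epsilon$ over a convex set $C$ and $z \in C$. Under this assumption the constant can be taken as $\sigma = \epsilon/2$; the constant intended in the paper may differ.

```latex
% J_eps(z) = c^T z - (eps/2) ||z||_2^2 is quadratic, so its second-order
% expansion around any point w is exact:
\begin{align*}
J_\epsilon(z) &= J_\epsilon(w) + \nabla J_\epsilon(w)^T (z - w)
                 - \tfrac{\epsilon}{2}\,\|z - w\|_2^2,
  \qquad \nabla J_\epsilon(w) = c - \epsilon w. \\
% First-order optimality of w over the convex set C (assumption: z in C):
\nabla J_\epsilon(w)^T (z - w) &\leq 0 \quad \text{for all } z \in C. \\
% Combining the two relations:
J_\epsilon(w) - J_\epsilon(z) &\geq \tfrac{\epsilon}{2}\,\|z - w\|_2^2 .
\end{align*}
```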



2) Proof of Lemma 4.2: Let $\bar{J}^{[i]} := J(\bar{z}^{[i]})$ be the objective value of the limit point $\bar{z}^{[i]}$ of the sequence $\{z^{[i]}(t)\}_{t \geq 0}$ computed by processor $i$. We show first that the limiting objective values $\bar{J}^{[i]}$ are identical for all processors. Suppose by contradiction that there exist two processors, say $i$ and $j$, such that $\bar{J}^{[i]} < \bar{J}^{[j]}$. Pick now $\delta_0 > 0$ such that $\bar{J}^{[j]} - \bar{J}^{[i]} > \delta_0$. The sequences $\{J(z^{[i]}(t))\}_{t \geq 0}$ and $\{J(z^{[j]}(t))\}_{t \geq 0}$ are monotonically non-increasing and convergent. Thus, for every $\delta > 0$ there exists a time $T_\delta$ such that for all $t \geq T_\delta$, $J(z^{[i]}(t)) - \bar{J}^{[i]} < \delta$ and $J(z^{[j]}(t)) - \bar{J}^{[j]} < \delta$. This implies that there exists $T_{\delta_0}$ such that for all $t \geq T_{\delta_0}$

$J(z^{[i]}(t)) < \delta_0 + \bar{J}^{[i]} < \bar{J}^{[j]}$.

Additionally, since the objective functions are non-increasing, it follows that for any time instant $t' \geq 0$, $J(z^{[j]}(t')) \geq \bar{J}^{[j]}$. Thus, for all $t \geq T_{\delta_0}$ and all $t' \geq 0$,

$J(z^{[i]}(t)) < J(z^{[j]}(t'))$. (32)

Pick now $t_0 \geq T_{\delta_0}$. For all $\tau \geq 0$ define an index set $I_\tau$ as follows: set $I_0 = \{i\}$ and for any $\tau \geq 1$ define $I_\tau$ by adding to $I_{\tau-1}$ all indices $k$ for which there exists some $l \in I_{\tau-1}$ such that $(l, k) \in E(t_0 + \tau)$. Since, by assumption, $\mathcal{G}_c^{\infty}(t_0)$ is strongly connected, the set $I_\tau$ will eventually include all indices $1, \ldots, n$, and in particular there is $\tau^*$ such that $j \in I_{\tau^*}$. The algorithm is such that for all $l \in I_\tau$, $J(z^{[l]}(t_0 + \tau)) \leq J(z^{[i]}(t_0))$ and thus

$J(z^{[j]}(t_0 + \tau^*)) \leq J(z^{[i]}(t_0))$. (33)

But (33) contradicts (32), proving that $\bar{J}^{[1]} = \bar{J}^{[2]} = \cdots = \bar{J}^{[n]} =: \bar{J}$. Thus, it must hold that for all $i, j \in \{1, \ldots, n\}$, $\lim_{t\to\infty} |J(z^{[i]}(t)) - J(z^{[j]}(t))| = 0$. From the strict concavity of $J(\cdot)$ it follows that $|J(z^{[i]}(t)) - J(z^{[j]}(t))| \geq \sigma \|z^{[i]}(t) - z^{[j]}(t)\|_2^2$ for some $\sigma > 0$. Therefore, $\lim_{t\to\infty} \|z^{[i]}(t) - z^{[j]}(t)\|_2 = 0$, which proves the lemma. ■
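The growth of the index sets $I_\tau$ used in this proof can be sketched in a few lines. The function name `reachable_index_sets`, the convention that an edge $(l, k)$ means that processor $l$ sends to processor $k$, and the example edge sequence are assumptions made for this illustration only.

```python
def reachable_index_sets(start, edge_sets):
    """Return [I_0, I_1, ...] for a time-varying directed graph.

    edge_sets[tau - 1] plays the role of E(t_0 + tau); an edge (l, k) is
    assumed to mean that processor l sends its basis to processor k.
    """
    sets = [{start}]
    for edges in edge_sets:
        prev = sets[-1]
        new = set(prev)
        for (l, k) in edges:
            if l in prev:
                new.add(k)  # k received a basis from some l in I_{tau-1}
        sets.append(new)
    return sets

# Hypothetical example: four processors, one active link per time step.
edge_sets = [{(1, 2)}, {(2, 3)}, {(3, 4)}, {(4, 1)}, {(1, 2)}]
sets = reachable_index_sets(3, edge_sets)
print(sets)
```

In this example no single edge set is strongly connected, but their union is, and the index set started at processor 3 covers all four processors after five steps.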
3) Proof of Theorem 4.3: It follows from Lemma 4.2 that the query points of all processors converge to the same query point, i.e., $\bar{z}^{[i]} = \bar{z}$ for all processors $i$. Now, we can conclude from Lemma 4.1 that $\bar{z} \in \mathcal{Z}_i$ for all $i$ and thus $\bar{z} \in \mathcal{Z}$. It follows now from Lemma A.1 part (ii) that $\bar{z}$ is an optimal solution to (1). It remains to show that $\bar{z}$ is the optimal solution with minimal 2-norm. Let $z^*$ be the optimal solution with minimal 2-norm. Then there exists an $\epsilon > 0$ such that the parameterized objective function satisfies $J_\epsilon(z^*) \geq J_\epsilon(z)$ for all $z \in \mathcal{Z}$ and $J_\epsilon(z^{[i]}(t)) \geq J_\epsilon(z^*)$ for all $t$. With the same argumentation used for Lemma A.1 part (ii), we conclude that $\bar{z}$ is the unique solution maximizing $J_\epsilon(\cdot)$ over $\mathcal{Z}$, i.e., $\bar{z}$ is the optimal solution to (1) with minimal 2-norm. ■
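The role of the parameterized cost in selecting the minimal 2-norm optimizer can be seen on a toy problem. The sketch below assumes a box-shaped feasible set, for which maximizing $J_\epsilon(z) = c^T z - \frac{\epsilon}{2}\|z\|_2^2$ separates by coordinate and reduces to clipping $c_k/\epsilon$ to the box. The function name `maximize_j_eps_on_box` and the example data are hypothetical.

```python
def maximize_j_eps_on_box(c, eps, lower, upper):
    """Maximize c^T z - (eps / 2) * ||z||_2^2 over the box lower <= z <= upper.

    Each coordinate is a concave quadratic in one variable, so its
    constrained maximizer is the unconstrained one, c_k / eps, clipped
    to [lower_k, upper_k].
    """
    return [min(max(ck / eps, lo), hi) for ck, lo, hi in zip(c, lower, upper)]

# Toy problem: maximize z_1 over the unit square. Every point (1, z_2) with
# 0 <= z_2 <= 1 is optimal for the linear cost; (1, 0) has minimal 2-norm.
c, lower, upper = [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]
for eps in (0.5, 1.0, 4.0):
    print(eps, maximize_j_eps_on_box(c, eps, lower, upper))
```

For this toy problem the maximizer of $J_\epsilon$ coincides with the minimal 2-norm optimizer $(1, 0)$ whenever $0 < \epsilon \leq 1$, while a larger value such as $\epsilon = 4$ yields $(0.25, 0)$, which is no longer optimal for the linear cost.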


